2022-04-21 07:53:21

Acknowledgements

  • Support from NICHD, NIH/OD, NIMH, & NIDA via R01HD094830-01; NSF via 2032713; the LEGO Foundation; & the Alfred P. Sloan Foundation
  • Karen Adolph, Cathie Tamis-LeMonda, Orit Herzberg, Tiger Teng

Overview

  • A cautionary tale
  • (Hyper)active curation
  • Lessons learned

A cautionary tale

(Hyper)active curation

What is it?

  • Embed data curation within data collection workflow
  • Rigorous & thorough quality assurance (QA) during collection
  • Curate data with specific sharing target in mind
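
One way to picture QA embedded in collection is a small check that runs each time new sessions arrive. The following is a minimal sketch, not the PLAY project's actual QA code: the `sessions` data frame, its column names, and the `qa_sessions()` helper are all hypothetical and chosen only for illustration.

```r
# Hypothetical session records; column names and values are illustrative only.
sessions <- data.frame(
  session_id = c("S001", "S002", "S003"),
  child_age_mos = c(12, 18, NA),
  video_uploaded = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one row per failed check so problems
# can be fixed while collection is still under way.
qa_sessions <- function(df) {
  problems <- data.frame(
    session_id = character(0),
    problem = character(0),
    stringsAsFactors = FALSE
  )
  missing_age <- is.na(df$child_age_mos)
  if (any(missing_age)) {
    problems <- rbind(problems, data.frame(
      session_id = df$session_id[missing_age],
      problem = "missing child age",
      stringsAsFactors = FALSE
    ))
  }
  no_video <- !df$video_uploaded
  if (any(no_video)) {
    problems <- rbind(problems, data.frame(
      session_id = df$session_id[no_video],
      problem = "video not uploaded",
      stringsAsFactors = FALSE
    ))
  }
  problems
}

qa_report <- qa_sessions(sessions)
print(qa_report)
```

Returning a table of problems, rather than stopping at the first error, lets a curator see everything that needs attention for a batch of sessions in one pass.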

Workflow

Collection

Coding (video annotation)

Quality assurance (video)

Data export & cleaning (surveys)

  • Final CSV uploaded to Databrary
  • play_data <- databraryapi::read_csv_data_as_df(session_id = 51539, asset_id = 366382)

Summarized

In-process

  • Full protocol on https://play-project.org
  • Considering migration of protocol to bookdown
  • Individual-level files
    • Including CHAT export of transcripts
  • Clean, aggregate MB-CDI data
  • From data dictionaries to open schemata
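
As a rough illustration of what moving from a data dictionary toward a machine-checkable schema might look like, the sketch below compares a data set's columns and types against a dictionary table. The dictionary contents, the `survey` data frame, and the `check_against_dictionary()` helper are assumptions made up for this example and do not describe the project's actual schemata.

```r
# Hypothetical data dictionary: one row per expected variable.
dictionary <- data.frame(
  variable = c("participant_id", "age_mos", "language"),
  type = c("character", "numeric", "character"),
  stringsAsFactors = FALSE
)

# Hypothetical cleaned data set to be checked against the dictionary.
survey <- data.frame(
  participant_id = c("001", "002"),
  age_mos = c("12", "18"),  # stored as text by mistake
  stringsAsFactors = FALSE
)

# Hypothetical helper: report, for each dictionary entry, whether the
# column exists and whether its class matches the expected type.
check_against_dictionary <- function(df, dict) {
  present <- dict$variable %in% names(df)
  # Actual class of each column that exists; NA where the column is missing.
  actual <- rep(NA_character_, nrow(dict))
  actual[present] <- vapply(
    df[dict$variable[present]],
    function(x) class(x)[1],
    character(1)
  )
  data.frame(
    variable = dict$variable,
    expected = dict$type,
    actual = actual,
    ok = present & !is.na(actual) & actual == dict$type,
    stringsAsFactors = FALSE
  )
}

dict_report <- check_against_dictionary(survey, dictionary)
print(dict_report)
```

Here the report flags `age_mos` (wrong type) and `language` (missing column), which is the kind of discrepancy a human-readable dictionary alone would not catch.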

Lessons learned

…psychologists tend to treat other peoples’ theories like toothbrushes; no self-respecting individual wants to use anyone else’s.

(Mischel, 2009)

The toothbrush culture undermines the building of a genuinely cumulative science, encouraging more parallel play and solo game playing, rather than building on each other’s directly relevant best work.

(Mischel, 2009)

“…psychologists tend to treat other peoples’ [data, data management practices, tasks, displays]…like toothbrushes…”

Cheeky open and reproducible developmental science advocates who want to create a more cumulative science

We don’t talk about (you know)…

Plan your work; work your plan

  • You have to curate data for yourself, so…
  • Clear to you == (often) clear to others
  • Curate with specific target in mind

  • Automate as much as possible
    • Script, use APIs
    • Exploit the web
    • Consistency is not “the hobgoblin of little minds”; it is key to successful automation!
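
To make the link between consistency and automation concrete, here is a minimal sketch that checks file names against a naming convention and parses the conforming ones. The convention (`PLAY_<site>_<3-digit id>.mp4`), the site codes, and the file names are all hypothetical and are not taken from the project's protocol.

```r
# Hypothetical naming convention: PLAY_<site>_<3-digit id>.mp4
# The pattern and the file names below are assumptions for illustration.
file_names <- c("PLAY_ABC_001.mp4", "PLAY_XYZ_014.mp4", "play-abc-2.mp4")
name_pattern <- "^PLAY_([A-Z]+)_([0-9]{3})\\.mp4$"

# TRUE for names that follow the convention.
is_valid <- grepl(name_pattern, file_names)

# Extract site and participant id from conforming names only.
parsed <- data.frame(
  file = file_names[is_valid],
  site = sub(name_pattern, "\\1", file_names[is_valid]),
  participant = sub(name_pattern, "\\2", file_names[is_valid]),
  stringsAsFactors = FALSE
)

print(parsed)
print(file_names[!is_valid])  # names that need fixing before automated steps run
```

When every file follows one pattern, metadata can be recovered from the name by script; a single nonconforming name forces manual handling.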

  • Test workflow at every step/look at data
  • Expect to iterate
  • Don’t make the perfect the enemy of the good
  • Meaningful data management plans increasingly required by funders
  • Make data a first-class product

Come PLAY with us!

  • Share best practices
  • Talk about “you know”
  • Let’s solve as-yet-unsolved problems…together
  • No reinventing wheels

Resources

This talk was produced on 2022-04-21 in RStudio using R Markdown and the ioslides framework. The code and materials used to generate the slides may be found at https://github.com/PLAY-behaviorome/2022-04-21-team-sci-cds/. Information about the R session that produced the slides is as follows:

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Monterey 12.3
## 
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets 
## [6] methods   base     
## 
## other attached packages:
## [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.8    
## [4] purrr_0.3.4     readr_2.1.1     tidyr_1.1.4    
## [7] tibble_3.1.6    ggplot2_3.3.5   tidyverse_1.3.1
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-153       fs_1.5.2          
##  [3] lubridate_1.8.0    webshot_0.5.2     
##  [5] httr_1.4.2         tools_4.1.2       
##  [7] backports_1.4.1    bslib_0.3.1       
##  [9] utf8_1.2.2         R6_2.5.1          
## [11] DBI_1.1.2          mgcv_1.8-38       
## [13] colorspace_2.0-3   withr_2.5.0       
## [15] processx_3.5.2     mnormt_2.0.2      
## [17] tidyselect_1.1.2   gridExtra_2.3     
## [19] curl_4.3.2         compiler_4.1.2    
## [21] cli_3.2.0          rvest_1.0.2       
## [23] xml2_1.3.3         labeling_0.4.2    
## [25] sass_0.4.1         scales_1.2.0      
## [27] psych_2.1.9        systemfonts_1.0.3 
## [29] digest_0.6.29      rmarkdown_2.13    
## [31] svglite_2.0.0      pkgconfig_2.0.3   
## [33] htmltools_0.5.2    dbplyr_2.1.1      
## [35] fastmap_1.1.0      highr_0.9         
## [37] rlang_1.0.2        readxl_1.3.1      
## [39] keyring_1.3.0      rstudioapi_0.13   
## [41] shiny_1.7.1        jquerylib_0.1.4   
## [43] generics_0.1.2     farver_2.1.0      
## [45] jsonlite_1.8.0     magrittr_2.0.3    
## [47] kableExtra_1.3.4   Matrix_1.3-4      
## [49] Rcpp_1.0.8.3       munsell_0.5.0     
## [51] fansi_1.0.3        lifecycle_1.0.1   
## [53] stringi_1.7.6      likert_1.3.5      
## [55] yaml_2.3.5         plyr_1.8.6        
## [57] grid_4.1.2         parallel_4.1.2    
## [59] promises_1.2.0.1   crayon_1.5.1      
## [61] miniUI_0.1.1.1     lattice_0.20-45   
## [63] haven_2.4.3        splines_4.1.2     
## [65] hms_1.1.1          tmvnsim_1.0-2     
## [67] ps_1.6.0           knitr_1.38        
## [69] pillar_1.7.0       reshape2_1.4.4    
## [71] servr_0.24         reprex_2.0.1      
## [73] glue_1.6.2         pagedown_0.16     
## [75] evaluate_0.15      modelr_0.1.8      
## [77] vctrs_0.4.1        tzdb_0.2.0        
## [79] httpuv_1.6.5       cellranger_1.1.0  
## [81] gtable_0.3.0       assertthat_0.2.1  
## [83] xfun_0.30          ggExtra_0.9       
## [85] mime_0.12          xtable_1.8-4      
## [87] broom_0.7.11       databraryapi_0.2.8
## [89] later_1.3.0        viridisLite_0.4.0 
## [91] websocket_1.4.1    ellipsis_0.3.2

References

DisneyMusicVEVO. (2021, December). We don’t talk about Bruno (from “Encanto”). YouTube. Retrieved from https://www.youtube.com/watch?v=bvWRMAU6V-c

Mischel, W. (2009). Becoming a cumulative science. APS Observer, 22(1). Retrieved from https://www.psychologicalscience.org/observer/becoming-a-cumulative-science

NYU Health Sciences Library. (2013, November). Data sharing and management snafu in 3 short acts (higher quality). YouTube. Retrieved from https://www.youtube.com/watch?v=66oNv_DJuPc

Soska, K. C., Xu, M., Gonzalez, S. L., Herzberg, O., Tamis-LeMonda, C. S., Gilmore, R. O., & Adolph, K. E. (2021). (Hyper)active data curation: A video case study from behavioral science. Journal of eScience Librarianship, 10(3). https://doi.org/10.7191/jeslib.2021.1208